Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes

Authors

  • Hanne Fersøe
  • Elviira Hartikainen
  • Henk van den Heuvel
  • Giulio Maltese
  • Asunción Moreno
  • Shaunie Shammass
  • Ute Ziegenhain
Abstract

This paper presents specifications and requirements for the creation and validation of large lexica needed in Automatic Speech Recognition (ASR), Text-to-Speech (TTS) and statistical Speech-to-Speech Translation (SST) systems. The language resources are created and validated within the scope of the EU-project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components) during the years 2002-2005. Large lexica with phonetic, suprasegmental and morphosyntactic content will be provided with well-documented specifications for 13 languages. A short summary of the LC-STAR project itself is presented. An overview of the specification for corpora collection and word extraction, as well as the specification and format of the lexica, is given. Particular attention is paid to the validation of the produced lexica and the lessons learnt during pre-validation. The created and validated language resources will be available via ELRA/ELDA.

Similar resources

Lexicon and Corpora for Speech to Speech Translation (LC-STAR)

The objective of the EU-project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components) is corpora collection and lexica creation for the purposes of Automatic Speech Recognition (ASR) and Text-to-speech (TTS) that are needed in speech-to-speech translation (SST). During the lifetime of the project (2002-2005) these lexica will be specified, built and validated. Large lexica co...

Large lexica for speech-to-speech translation: from specification to creation

This paper presents the corpora collection and lexica creation for the purposes of Automatic Speech Recognition (ASR) and Text-to-speech (TTS) that are needed in speech-to-speech translation (SST). These lexica will be specified, built and validated within the scope of the EU-project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components) during the years 2002-2005. Large lexic...

LC-STAR II: Starring more Lexica

LC-STAR II is a follow-up project of the EU-funded project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components, IST-2001-32216). LC-STAR II develops large lexica containing information for speech processing in ten languages, targeting especially automatic speech recognition and text-to-speech synthesis but also other applications like speech-to-speech translation and taggin...

Lexica and corpora for speech-to-speech translation: a trilingual approach

Creation of lexica and corpora for Catalan, Spanish and US-English is described. A lexicon is being created for speech recognition and synthesis including relevant information. The lexicon contains 50K common words selected to achieve a wide coverage on the chosen domains, and 50K additional entries including special application words, and proper nouns. Furthermore, a large trilingual spontaneo...

Creating Slovenian Language Resources for Development of Speech-to-speech Translation Components

This article gives detailed information about the procedures for building Slovenian lexica within the LC-STAR project, and about the size of those lexica. The University of Maribor joined the LC-STAR project in order to provide appropriate language resources for developing speech-to-speech translation technology for the Slovenian language. The lexica consist of three parts: 65.000 common w...

Publication date: 2004